This notebook contains does some basic data visualizations for booking data from the Fulton County jail data from the beginning of 2017 to the present (November 26th).


In [6]:
%matplotlib inline
import pandas as pd
import numpy as np
import seaborn as sns

In [7]:
time_columns = ['inmate_dob',
               'booking_timestamp',
               'release_timestamp',
               'court_date']

index_col = 'inmate_id'

In [9]:
scrape1 = pd.read_csv("fulton_2017-11-26_09-15-47.csv", 
                     parse_dates=time_columns,
                     index_col=index_col)
scrape2 = pd.read_csv("fulton_2017-11-26_10-34-04.csv",
                     parse_dates=time_columns,
                     index_col=index_col)

Change the above cell to refer to the file locations on your computer (The reason it is two files is that I encountered a previously unseen error halfway through, and had to put a new try/except into the code and restart the scraping).


In [10]:
len(scrape1)


Out[10]:
11976

In [11]:
len(scrape2)


Out[11]:
11594

In [12]:
df = pd.concat([scrape1,scrape2])

In [13]:
len(df)


Out[13]:
23570

In [15]:
df.columns


Out[15]:
Index(['county_name', 'timestamp', 'url', 'inmate_lastname',
       'inmate_firstname', 'inmate_middlename', 'inmate_sex', 'inmate_race',
       'inmate_age', 'inmate_dob', 'inmate_address', 'booking_timestamp',
       'release_timestamp', 'processing_numbers', 'agency', 'facility',
       'charges', 'severity', 'bond_amount', 'current_status', 'court_date',
       'days_jailed', 'other', 'notes'],
      dtype='object')

In [16]:
df['days_jailed'] = df.release_timestamp - df.booking_timestamp

In [17]:
df['days_jailed_np'] = df.days_jailed.dt.days

In [25]:
df.loc[df['days_jailed_np']>7,'days_jailed_np'] = 7

In [26]:
sns.distplot(df['days_jailed_np'].dropna())


Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x294aebdac50>

This gives us the overall distribution of time imprisoned for everyone in our dataset who has been released.


In [19]:
df.groupby('inmate_race').agg({'days_jailed_np' : np.mean}).plot(kind='bar')


Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x294ae599588>

This gives us mean time in prison by race.


In [20]:
ax= sns.violinplot(data=df, x='inmate_race', y='days_jailed_np', cut=0, scale='width')
for tick in ax.get_xticklabels():
    tick.set_rotation(45)


This is a violin plot, which gives us a breakdown of how the distribution of days in jail varies by race. Unfortunately I can't figure out how to set the category labels nicely.